6,214 research outputs found

    Diversifying Top-K Results

    Top-k query processing finds a list of k results that have the largest scores w.r.t. the user-given query, under the assumption that the k results are independent of each other. In practice, some of the top-k results returned can be very similar to each other, and as a result some of them are redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some works address diversity only for a specific problem and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that only considers the similarity of the search results themselves. We propose a framework such that most existing solutions for top-k query processing can be extended easily to handle diversified top-k search, by simply supplying three new functions: a sufficient stop condition sufficient(), a necessary stop condition necessary(), and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm; div-dp decomposes the results into components that are searched using div-astar independently and combined using dynamic programming; div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution for the diversified top-k search problem in seconds even for k as large as 2,000. Comment: VLDB201
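    The sketch below is one plausible reading of the plug-in framework this abstract describes, not the authors' code: an existing top-k engine generates candidates in score order, and the three hooks sufficient(), necessary(), and div-search-current() turn it into diversified top-k search. The exact interplay of the two stop conditions is defined in the paper; here necessary() is treated as a cheap precondition and sufficient() as the optimality guarantee, which is an assumption.

```python
def diversified_topk(generate_next, sufficient, necessary, div_search_current, k):
    """A hedged sketch of the generic diversified top-k framework.

    generate_next()               -- yields candidates in non-increasing score order
    necessary(current, k)         -- assumed cheap precondition for stopping
    sufficient(current, k)        -- assumed guarantee that the answer over
                                     `current` is already optimal
    div_search_current(current,k) -- solves diversified top-k on `current`
                                     (e.g. div-astar / div-dp / div-cut)
    """
    current = []
    for candidate in generate_next():
        current.append(candidate)
        # Only when the cheap necessary condition holds do we evaluate the
        # stronger sufficient condition; if both hold, no further candidate
        # can change the diversified top-k answer.
        if necessary(current, k) and sufficient(current, k):
            break
    return div_search_current(current, k)
```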

    Evolution of cooperation in spatial traveler's dilemma game

    Traveler's dilemma (TD) is a social dilemma that has been well studied in the economics community, but it has attracted little attention in the physics community. The TD game is a two-person game. Each player selects an integer value between R and M (R < M) as a pure strategy. If both players select the same value, the payoff to each of them is that value. If the players select different values, say i and j (R ≤ i < j ≤ M), then the payoff to the player who chooses the smaller value is i + R and the payoff to the other player is i - R. We term the player who selects the larger value the cooperator, and the one who chooses the smaller value the defector, because if both players select large values, the total payoff is large. The Nash equilibrium of the TD game is to choose the smallest value R. However, in previous behavioral studies, players in the TD game typically select values that are much larger than R, and the average selected value exhibits an inverse relationship with R. To explain such anomalous behavior, in this paper, we study the evolution of cooperation in the spatial traveler's dilemma game, where the players are located on a square lattice and each player plays TD games with his neighbors. Players in our model can adopt their neighbors' strategies following two standard models of spatial game dynamics. Monte Carlo simulation is applied to our model, and the results show that the cooperation level of the system, which is proportional to the average value of the strategies, decreases as R increases, until R exceeds a threshold beyond which cooperation vanishes. Our findings indicate that spatial reciprocity promotes the evolution of cooperation in the TD game and that the spatial TD game model can interpret the anomalous behavior observed in previous behavioral experiments.
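    A small illustration (not from the paper) of the traveler's dilemma payoff rule stated above, for two pure strategies i and j in [R, M]:

```python
def td_payoffs(i, j, R):
    """Return (payoff to the player claiming i, payoff to the player claiming j)."""
    if i == j:                      # both claim the same value
        return i, i
    low = min(i, j)
    bonus_to_low = low + R          # the lower claim is rewarded
    penalty_to_high = low - R       # the higher claim is penalized
    if i < j:
        return bonus_to_low, penalty_to_high
    return penalty_to_high, bonus_to_low

# Example: with R = 2, claims 98 and 100 give payoffs (100, 96), illustrating
# why undercutting pays and why claiming the minimum R is the Nash equilibrium.
print(td_payoffs(98, 100, 2))   # -> (100, 96)
```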

    Keyword Search on RDF Graphs - A Query Graph Assembly Approach

    Keyword search provides ordinary users with an easy-to-use interface for querying RDF data. Given the input keywords, in this paper, we study how to assemble a query graph that represents the user's query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges. Then, we formally define the query graph assembly (QGA) problem. Unfortunately, we prove theoretically that QGA is an NP-complete problem. To solve it, we design heuristic lower bounds and propose a bipartite graph matching-based best-first search algorithm. The algorithm's time complexity is O(k^{2l} · l^{3l}), where l is the number of keywords and k is a tunable parameter, i.e., the maximum number of candidate entity/class vertices and predicate edges allowed to match each keyword. Although QGA is intractable, both l and k are small in practice. Furthermore, the algorithm's time complexity does not depend on the RDF graph size, which guarantees the good scalability of our system on large RDF graphs. Experiments on DBpedia and Freebase confirm the superiority of our system in both effectiveness and efficiency.
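    As a rough illustration of the best-first search pattern the abstract refers to (not the paper's actual algorithm), partial query-graph assemblies can be expanded in order of cost-so-far plus an admissible lower bound, so the first complete assembly popped is cost-optimal. The state representation, expand(), and lower_bound() below are placeholders.

```python
import heapq

def best_first_assemble(initial_state, expand, lower_bound, is_complete):
    """expand(s) yields (step_cost, next_state); lower_bound(s) must never
    overestimate the remaining cost (admissible), as in A* search."""
    frontier = [(lower_bound(initial_state), 0.0, 0, initial_state)]
    counter = 1                                   # tie-breaker so states are never compared
    while frontier:
        _, cost, _, state = heapq.heappop(frontier)
        if is_complete(state):
            return state, cost                    # optimal by admissibility of the bound
        for step_cost, nxt in expand(state):
            new_cost = cost + step_cost
            heapq.heappush(frontier,
                           (new_cost + lower_bound(nxt), new_cost, counter, nxt))
            counter += 1
    return None, float("inf")
```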

    Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data

    The probabilistic threshold query is one of the most common queries in uncertain databases, where a result satisfying the query must also have a probability that meets the threshold requirement. In this paper, we investigate probabilistic threshold keyword queries (PrTKQ) over XML data, which have not been studied before. We first introduce the notion of quasi-SLCA and use it to represent results for a PrTKQ under possible-world semantics. Then we design a probabilistic inverted (PI) index that can be used to quickly return the qualified answers and filter out the unqualified ones based on our proposed lower/upper bounds. After that, we propose two efficient and comparable algorithms: a Baseline Algorithm and a PI index-based Algorithm. To accelerate the algorithms, we also utilize probability density functions. An empirical study using real and synthetic data sets has verified the effectiveness and the efficiency of our approaches.
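    A minimal sketch (an assumption-based illustration, not the paper's PI-index code) of how lower/upper probability bounds let a threshold query accept or prune candidates without always computing exact probabilities:

```python
def filter_by_threshold(candidates, lower, upper, exact_prob, threshold):
    """lower(c) <= P(c) <= upper(c); exact_prob(c) is the expensive fallback."""
    answers = []
    for c in candidates:
        if upper(c) < threshold:        # cannot qualify: prune cheaply
            continue
        if lower(c) >= threshold:       # guaranteed to qualify: accept cheaply
            answers.append(c)
            continue
        if exact_prob(c) >= threshold:  # undecided: fall back to exact computation
            answers.append(c)
    return answers
```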

    A Fast Order-Based Approach for Core Maintenance

    Graphs have been widely used in many applications such as social networks, collaboration networks, and biological networks. One important graph analytics task is to explore cohesive subgraphs in a large graph. Among the several cohesive subgraph models studied, the k-core is one that can be computed in linear time for a static graph. Since graphs evolve in real applications, in this paper we study core maintenance, which aims to reduce the computational cost of recomputing k-cores when a graph is updated from time to time dynamically. We identify drawbacks of the existing efficient algorithm: it needs a large search space to find the vertices that need to be updated, and it has high overhead to maintain the index it builds when a graph is updated. We propose a new order-based approach that maintains an order, called k-order, among vertices while a graph is updated. Our new algorithm can outperform the state-of-the-art algorithm by up to 3 orders of magnitude on the 11 large real graphs tested. We report our findings in this paper.
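    For context, the sketch below shows the standard peeling idea behind static core decomposition that the abstract mentions; it favors clarity over the strict linear-time bucket implementation, and the paper's order-based maintenance scheme is not shown here.

```python
from collections import defaultdict

def core_numbers(adj):
    """adj: dict mapping each vertex to the set of its neighbours (undirected)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(set)            # group vertices by current degree
    for v, d in degree.items():
        buckets[d].add(v)
    core, removed = {}, set()
    for _ in range(len(adj)):
        d = min(b for b in buckets if buckets[b])   # smallest non-empty degree
        v = buckets[d].pop()
        core[v] = d                                  # core number = degree at removal
        removed.add(v)
        for u in adj[v]:
            if u in removed:
                continue
            buckets[degree[u]].discard(u)
            degree[u] = max(degree[u] - 1, d)        # degrees never drop below d
            buckets[degree[u]].add(u)
    return core
```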

    Analysis on the TGA Model for Stance Detection

    Stance detection, the problem of finding the stance that an author takes on a specific issue, is a large subarea of NLP and AI, and its uses can already be seen in a multitude of applications. The majority of stance detection machine learning models are tested against a popular dataset called SemEval2016, a collection of tweets, authors, topics, and stances derived from Twitter data and the Twitter API. Many researchers across the globe have created machine learning models to accurately predict the stance of authors based on their tweets regarding a certain topic. Recently, however, researchers at Columbia University created a new dataset called VAST along with a model called Topic-Grouped Attention (TGA), better known as TGANet, which claims to perform well on zero-shot and few-shot stance detection, a subset of stance detection that focuses on determining the stance of authors on new, never-seen topics. Their VAST dataset targets this zero-shot and few-shot sub-problem by including a large variety of topics; it has many more topics than traditional stance detection datasets, which often center their topics around a particular subject. In this thesis, we analyze how the TGA model performs on the SemEval2016 dataset and determine whether the TGA model improves on the existing zero-shot and few-shot stance detection models.